This study develops and tests an instrument to assess the fidelity of
the intended program, i.e., experimental treatment, in the evaluation
of a preschool program. During the school year (1972-73) CEMREL
(Central Midwestern Educational Lab) investigated the consequences of
different levels of training on implementation of the Demonstration
and Research Center for Early Education (DARCEE) program. Part of this
investigation involved three separate ratings of the pilot test
classrooms with the assessment scale. These ratings were given at the
beginning, middle, and end of the school year. Classrooms with maximum
training scored on the average approximately 10 per cent higher on
each of the essentials than did the classes with materials only. With
comparison classes, however, that consistency was lacking. On the
essentials of physical setting, unit use, and parent involvement the
comparison classrooms actually scored higher than the DARCEE group
with the maximum training; on two other essentials (reinforcement and
behavior management and attitude development) and on student
involvement they scored higher than DARCEE classrooms with the minimum
training; whereas on the essentials of skill development, organization
and use of time, grouping, teacher roles and responsibilities, and
teacher preparation these classes scored lower than both DARCEE
classroom treatments.

THE DEVELOPMENT, USE, AND IMPORTANCE OF INSTRUMENTS
THAT VALIDLY AND RELIABLY ASSESS THE DEGREE
TO WHICH EXPERIMENTAL PROGRAMS
ARE IMPLEMENTED

Warren Solomon, Daniel Ferritor,
Joseph Haern, Edwin Myers
CEMREL, Inc.

Over the past several years we have witnessed an almost exponential
rise in intervention programs, curriculum materials, and special training
programs designed to facilitate cognitive, perceptual, psychomotor, and
social-emotional development in home and school settings. Simultaneously,
there has been a similar rise in the quantity as well as the quality of
educational program evaluations. In many of these evaluations we find
increased attention focused on the assessment of specific child outcomes
targeted by the program, or materials within them.

When evaluations deal primarily with the sets of expected child
outcomes derived from the program objectives, the interpretations appear to
be relatively straightforward. That is, the evaluator can state that the
program's materials, strategies, and training procedures were carried out in
an effort to attain a given set of objectives and, in fact, attained a
certain percentage of those objectives. Based on statements such as these,
one is tempted to draw conclusions on whether or not the treatment was
efficacious. "Efficacious treatments" could be defined, for example, as ones
in which 50, 60, or possibly 70 per cent of the program's objectives were
attained. In point of fact, this hypothetical evaluation strategy may lead to
erroneous conclusions (Gross, Giacquinta, & Bernstein, 1971, pp. 3-7). The
critical factor may be the "were carried out" or implementation dimension.
The fact that materials and strategies were prescribed does not guarantee
that the teacher actually engaged children in the intended way with the
program's set of curriculum materials. If throughout the year the teacher did
not implement a particular aspect of the program, it is misleading to say
that the program is one that is not able to attain its objectives. Perhaps
one is equally justified in suggesting that the program's training was
carried out poorly.

This is not to say that the development and use of tests based on
program objectives are unimportant in program evaluations. The argument,
rather, is that such program evaluations are incomplete (Stake, 1967, p. 5)
and may suggest unwarranted causal relationships between treatment and
hypothesized outcomes. What is also necessary is an examination of the extent
to which the program actually was implemented.

The objective of this study was to develop and test an instrument
to assess the fidelity of the intended program, i.e., experimental treatment,
in the evaluation of a preschool program. With such an instrument the
evaluator could look carefully at the program as it is installed in different
sites to examine the degree to which each of the independent or regulating
variables defined by the developer as major components is present. With
the knowledge of the presence or absence of these variables, it is possible
to conduct a thorough and fair evaluation of the program's ability to attain
the outcomes it seeks, as well as to evaluate the training component of the
program. With such data one might find that a program that apparently
attained only 50 per cent of its prescribed objectives in fact attained 90
per cent of the objectives that teachers actually chose and attempted to
attain. Such information has relevance not only for summative evaluations and
for comparative analyses of the program effectiveness, but for formative
evaluations, to provide data to program developers on possible revisions in
program specifications and training procedures.
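
The contrast between objectives prescribed and objectives actually attempted can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the counts (18 prescribed, 10 attempted, 9 attained) are hypothetical and were chosen simply to reproduce the 50 per cent and 90 per cent figures mentioned above.

```python
def attainment_rates(prescribed, attempted, attained):
    """Return (share of prescribed objectives attained,
               share of attempted objectives attained)."""
    # Attained objectives must be among those attempted, and attempted
    # objectives must be among those prescribed.
    if not 0 <= attained <= attempted <= prescribed:
        raise ValueError("expected 0 <= attained <= attempted <= prescribed")
    if attempted == 0:
        raise ValueError("no objectives were attempted")
    return attained / prescribed, attained / attempted


# Hypothetical counts for one classroom.
overall, among_attempted = attainment_rates(prescribed=18, attempted=10, attained=9)
print(f"{overall:.0%} of prescribed objectives attained")
print(f"{among_attempted:.0%} of attempted objectives attained")
```

The same classroom looks very different depending on the denominator, which is why a measure of what was actually attempted is needed alongside outcome tests.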

Why Develop a Degree of Implementation Instrument? 

Some argue that the most economical way to assess how well a
program is implemented would be to make use of an existing instrument. If we
imagine, for example, that a program has as a component the prescription that
the teacher teach indirectly, then the interaction analysis system of Flanders
(1960, pp. 257-265) could be used to help determine the degree of
implementation. Or, if the program prescribes that teachers ask many
questions that call for divergent thinking responses, the interaction
analysis system of Gallagher and Aschner (1968, pp. 219-133) would seem
appropriate.

Footnote: The particular preschool program we were evaluating was one
developed at Peabody College by the Demonstration and Research Center for
Early Education (DARCEE). In this document the program will be called "The
DARCEE Program."

Unfortunately, as we attempted to assess the degree to which teaching
teams in classrooms in selected sites were implementing the major independent
variables of the DARCEE preschool program, none of the existing observational
systems served our needs. While they might have provided some interesting
research data, they would not have shed light on the implementation
questions felt to be critical in the evaluation. Our solution was to develop
a new instrument that would answer these questions. In particular, we
wanted an instrument that could assess to what extent teachers were
implementing the entire preschool program.

Measuring the total program implementation allows the evaluator
to gain information relevant to a number of issues. First, he can determine
the extent to which the objectives of the teacher training materials and
procedures are realized. Second, he can use the instrument to serve the
formative evaluation role of helping trainers use data to examine their
training priorities. Third, he can determine which of the program variables
are harder to implement than others. And, finally, he can determine which
program variables are most important in attaining child outcomes. That
is, he can distinguish between those independent variables having major
effects as opposed to minor or negligible effects on the desired child
outcomes.

Footnote: DARCEE's program process variables have been named by DARCEE the
program's "essentials." The essentials are sets of prescriptions organized
about themes that specify how the teacher is to organize the space, time,
groupings, and content and how the teacher is to interact with team members
and children. One set of prescriptions focused on organization of space is
called "the physical setting"; another set of prescriptions focused on how
the teacher is to interact with children is called "behavior management and
positive reinforcement," and so on. DARCEE specified ten essentials in the
1971-72 school year and eleven in the 1972-73 school year. The latter set
includes "physical setting," "organization and use of time," "grouping,"
"teacher roles and responsibilities," "teacher preparation," "materials
use," "attitude development," "behavior management and positive
reinforcement," "skill development," "unit use," and "parent involvement."

Our decision to develop a new instrument was not unique. Gross,
Giacquinta, and Bernstein (1971) developed such an instrument in a study of
the installation of an innovative program in an elementary school, as did
Oliver and Shaver (1966), when they investigated two styles of teaching
(Socratic analysis and recitation analysis) in their social studies
curriculum project.

Development of the Instrument

There were several phases in the development of "The DARCEE
Classroom Assessment Scale." First, we became familiar with the DARCEE program
by reading DARCEE documents describing the program, by observing the program
in operation in many sites, by participating in DARCEE training workshops,
and by discussing the DARCEE program at length with DARCEE developers and
trainers. This phase may be called the "program familiarization phase."

During the second phase, the "instrument development phase," we
used the description of DARCEE's ten "essentials" (Brown, Dokecki, O'Connor,
& Stinson, 1971) and sorted the S5 items of a classroom checklist previously
developed by DARCEE using each DARCEE essential as a category. New items
were then written, and vague items of the original checklist were clarified
to make possible reliable scoring.

The third phase of the instrument development could be called the
"instrument refinement phase." After the first version was drafted, a
meeting was held for the development staff to examine and critique the
instrument. Then, following completion of a second draft of the instrument,
members of the CEMREL staff and DARCEE training staff observed the
development site classroom, as well as classrooms in Mille Lacs, Minnesota and
Macon, Georgia. At least two observers scored the same classroom at the
same time in an effort to determine the interscorer agreement and further
specify items to make them more reliable. Interscorer agreement scores
(percentages of agreement) were 70.5 per cent and 83.3 per cent in two Mille
Lacs classrooms (February 1972) and were Sk.k per cent and 89.3 per cent in
two Macon classrooms (April 1972).
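
A percentage-of-agreement score of this kind can be computed directly from two raters' item scores. The following is a minimal sketch, assuming that agreement means the two raters assigned the identical score to an item; the function name and the six sample scores are hypothetical and are not data from the study.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of items on which two raters gave the same score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same items")
    if not scores_a:
        raise ValueError("at least one item is required")
    # Count the items scored identically by both raters.
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return 100.0 * matches / len(scores_a)


# Hypothetical item scores for one classroom visit (each item scored 0, 1, or 2).
rater_1 = [2, 1, 0, 2, 2, 1]
rater_2 = [2, 1, 0, 2, 1, 1]
print(round(percent_agreement(rater_1, rater_2), 1))  # 5 of 6 items agree
```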

Later, in the spring of 1972, after formulas for computing
subscores that correspond to DARCEE essentials were developed, all DARCEE
classes and four non-DARCEE classes were visited and scored using the
assessment scale. The findings, summarized below, reveal that the instrument
was sensitive to differences between DARCEE and non-DARCEE classrooms on
many of the subscores.

The assessment scale was further revised during the summer of
1972 to make the instrument one that raters unfamiliar with the DARCEE
preschool program could use. In this revised version, the items include much
more descriptive information, terms are defined more precisely, and scoring
instructions are detailed.

From the above description of how the assessment scale was
developed, it is clear that (a) the instrument was developed after evaluators
studied the program, (b) the instrument has been refined using
recommendations of program developers as inputs in an effort to get content
validation, and (c) the instrument has been modified based on field tests of
its use to increase the reliability of items.

Nature of the Implementation Instrument

The implementation instrument was designed to assess the extent
to which various preschool classrooms resemble the ideal DARCEE classroom
as defined by DARCEE developers. The instrument consists of 95 items which
utilize three different measurement strategies: (a) some items are scored
by observing each of the teachers as they interact with the children or by
observing the displays and physical arrangements of the classroom, (b) some
items are scored by examining documents written by the teachers, and
(c) some items are scored by rating responses made by the teachers when
interviewed. Figures 1, 2, and 3 show examples of these three scoring
techniques.

Whatever form of measurement was employed on any given item, each
item is scored on a three-point scale ranging from 0 to 2, with 0 representing
non-correspondence with the ideal DARCEE classroom and 2 representing
correspondence with the DARCEE classroom. Scores on each item contribute to
one or more subscores which correspond to specific DARCEE essentials. By
collapsing the 95 items into subscores, one may examine each classroom with
regard to the extent to which each DARCEE essential is being implemented in
the classroom. The subscores are then summarized on a chart showing the
classroom profile. Figure 4 is an example of the summarizing profile.
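
How item scores might be collapsed into a profile can be sketched as follows. The paper does not give the subscore formulas, so this sketch assumes that a subscore is simply the mean of its items; the item numbers, the item-to-essential mapping, and the function name are hypothetical. The division by 2 follows the note to Table 5, which reports proportion-of-agreement scores obtained by dividing the subscores by 2.

```python
from statistics import mean

# Hypothetical mapping from essentials to item numbers. The real scale has
# 95 items, and an item may contribute to more than one subscore.
SUBSCORE_ITEMS = {
    "physical setting": [1, 2, 3, 4],
    "grouping": [4, 5, 6],
}


def subscore_profile(item_scores, subscore_items):
    """Return {essential: (subscore average, proportion of agreement)}."""
    profile = {}
    for essential, items in subscore_items.items():
        scores = [item_scores[i] for i in items]
        if any(s not in (0, 1, 2) for s in scores):
            raise ValueError("item scores must be 0, 1, or 2")
        average = mean(scores)                       # ranges from 0 to 2
        profile[essential] = (average, average / 2)  # proportion ranges from 0 to 1
    return profile


# Hypothetical item scores for one classroom.
scores = {1: 2, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
for essential, (average, proportion) in subscore_profile(scores, SUBSCORE_ITEMS).items():
    print(f"{essential}: average {average}, proportion {proportion}")
```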

The assessment scale requires one full day of classroom
observation starting before the beginning of class and ending only after the
teachers have completed their daily planning and evaluation meeting, which
usually occurs after the children have departed for the day. To assure
the content validity of the subscores, the items and subscore formulas were

Figure 1 Sample Observation Item

Completeness of the Schedule

The DARCEE classroom schedule includes a number of specified kinds of
activities that are to recur each day. Since the schedules are usually
posted on the wall, you will usually need only look at the posted schedule
to score this item. If the schedule is not posted, you could simply take
note of activities that occur as they occur and check them off on the
score sheet. The activities that should recur daily are:

a. At least one large-group activity. [In large group, the entire
classroom of children sit together to receive instruction conducted
usually by the lead teacher.]

b. One small-group activity. [Small-group activities are conducted by
teachers teaching groups of four to ten children.]

c. A second small-group activity. [The description for "b" applies
here.]

d. Structured free choice. [Children are given a period of time to
participate in an activity or activities they have chosen from a
limited number of options.]

e. Meals and/or snacks.

f. Toileting and washing hands.

g. Outdoor activity. [Weather permitting, children have some time
during the daily session to go outside to play. If there is inclement
weather, they have some substitute activity, usually active games.]

h. A second large-group meeting near the end of the day.

On the score sheet, check which of the above items are part of the daily
schedule. Then, score:

(0) If TWO or MORE of the above ITEMS are OMITTED.

(1) If ONE of the above ITEMS is OMITTED.

(2) If NONE of the above ITEMS is OMITTED.
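
The scoring rule of Figure 1 can be written out as a short function. This is an illustrative sketch only: the activity labels are shorthand invented here for items a through h, and the function name is hypothetical.

```python
# Shorthand labels (hypothetical) for the eight daily activities a-h in Figure 1.
REQUIRED_ACTIVITIES = (
    "large group",
    "first small group",
    "second small group",
    "structured free choice",
    "meals or snacks",
    "toileting and hand washing",
    "outdoor activity",
    "closing large group",
)


def score_schedule(observed_activities):
    """Score schedule completeness on the 0-2 scale from the number of omissions."""
    omitted = [a for a in REQUIRED_ACTIVITIES if a not in observed_activities]
    if len(omitted) >= 2:
        return 0  # two or more required activities omitted
    if len(omitted) == 1:
        return 1  # exactly one required activity omitted
    return 2      # nothing omitted


# A posted schedule that lacks only the outdoor activity scores 1.
posted = set(REQUIRED_ACTIVITIES) - {"outdoor activity"}
print(score_schedule(posted))
```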

Figure 2 Sample Item Scored by Analyzing
Written Records

The Number of Lesson Plans

In the DARCEE classroom prior to the daily session the Lead Teacher
should have prepared a lesson plan for large group, and she and her
Assistants should have prepared lesson plans for all of their small
groups. The definition of "lesson plan" for this item is as follows:
The lesson plan must be a statement in writing of (a) at least one
objective and (b) at least one material and strategy to be used for
the small- or large-group activity session.

To score this item, collect all lesson plans and eliminate those that
do not meet criteria (a) or (b). Then, score:

(0) If 2 OR FEWER LESSON PLANS WERE WRITTEN prior to teaching by all
of the teachers.

(1) For situations between (0) and (2).

(2) If ALL OR ALL BUT ONE of the POSSIBLE LESSON PLANS WERE WRITTEN
PRIOR TO TEACHING. [To figure out how many lesson plans are
possible, assume that each teacher should have one lesson plan for
each small group he or she teaches and that in addition the Lead
Teacher should have a lesson plan for her large-group session.
For example, in Mrs. Keller's room there are two small-group
activity sessions and two teachers, including Mrs. Keller. Under
those circumstances there should be four small-group activity
lesson plans plus one large-group activity lesson plan, or a
total of five lesson plans. If there were an additional Assistant
Teacher as part of Mrs. Keller's team, two additional small-group
lesson plans should be prepared, making a total of seven lesson
plans.]
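
The arithmetic in Figure 2 can be sketched in code. The function names are hypothetical. One assumption is flagged: for very small teams rules (0) and (2) can both apply (for example, two plans written out of three possible), and the figure does not say which takes precedence, so this sketch checks rule (0) first.

```python
def possible_lesson_plans(teachers, small_group_sessions):
    """One plan per teacher per small-group session, plus one large-group plan."""
    return teachers * small_group_sessions + 1


def score_lesson_plans(valid_plans_written, possible):
    """Score the item on the 0-2 scale.

    valid_plans_written counts only plans meeting criteria (a) and (b).
    Assumption: rule (0) is checked before rule (2) when both could apply.
    """
    if valid_plans_written <= 2:
        return 0  # two or fewer plans written
    if valid_plans_written >= possible - 1:
        return 2  # all, or all but one, of the possible plans written
    return 1      # situations between (0) and (2)


# Mrs. Keller's room: two teachers and two small-group sessions.
possible = possible_lesson_plans(teachers=2, small_group_sessions=2)
print(possible, score_lesson_plans(valid_plans_written=4, possible=possible))
```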

Figure 3 Sample Item Scored from Teacher Interview

Criteria for Grouping and Regrouping Children in Small Groups

In the DARCEE classroom each child is placed in a small group for daily
instructional purposes, meals and/or snacks, and other reasons. Children
are to be grouped and regrouped in their particular small groups on the
basis of two principles: (a) ability (children are to be placed in
groups with children having similar levels of skills) and (b) social
factors (children are to be placed in groups of children with whom they
are compatible. Some children high in certain behavior patterns, like
following directions, may be placed in groups as role models for others
to follow. Some children are placed in groups to separate them from
children whose influence on their behavior is negative.)

To score this item ask the question in the box below:

How did you place children in their small groups? [If the
answer is too general to score, ask specific questions such
as, "Why did you place Johnny in Miss Smith's group instead
of Mr. Kelso's? Why did you place Annette in the group she
is in?" etc.] Do you regroup your children? [If so] how do
you decide which children to regroup?

Score:

(0) If the TEACHER indicated she CONSIDERED NEITHER (a) ABILITY FACTORS
consistent with the DARCEE program or (b) SOCIAL FACTORS consistent
with the DARCEE program (see the paragraph describing DARCEE grouping
[remainder illegible]

(1) [illegible] REASON (a) OR (b) BUT NOT BOTH (a) and (b), OR
for situations between (0) and (2).

(2) [illegible] (a) AND (b) both for grouping and re- [remainder illegible]

Figure 4 Sample Summary Profile
for Classroom

Date of Observation: 1/1/73        Time of Day: 8:30-2:30
Site: St. Louis
Teacher: Mr. Jones                 Rater: B. Stone

RATING FORM PROFILE
(scale runs from nonagreement with DARCEE to agreement with DARCEE)

Subscore                             Subscore Average
 1. Physical Setting                 [illegible]
 2. Organ. and Use of Time           1.25
 3. Grouping-Indiv.                  [illegible]
 4. Roles of Ts in Their Teams       [illegible]
 5./10. T. Prep/Materials Use        [illegible]
 6. Attitude Development             1.5
 7. Reinforcement and Beh. Mgt.      2.0
 8. Skill Development                1.5
 9. Unit Use                         [illegible]
11. Parent Involvement               [illegible]
SI  Student Involvement              1.75

[Profile chart not reproduced. Proportion of Agreement Scores legible in
the source: .75, .62, .87, .87.]

examined and modified by the DARCEE preschool development and training
staff.

Before they are ready to use the instrument independently, raters
need approximately a day and one-half of training, which includes supervised
use of the instrument.



It should be noted that the assessment scale itself is an
instrument which measures the degree to which teachers behave in accordance
with DARCEE teaching principles, and as a result the instrument reflects
DARCEE's assumption that teaching process variables are more important
independent variables than content so far as child outcomes are concerned. As
evaluators, we were not able to accept measures of these process variables
as the sole measurement of degree of implementation. That is, since there
was an entire set of child objectives in the cognitive and skill domain,
there should also be a degree of implementation measure on whether the
teachers actually taught the content implied in the objectives. Therefore,
the CEMREL evaluation staff not only developed the degree of implementation
measure discussed in this paper but also sought to determine the degree to
which the teachers actually attempted to attain the program's child outcomes.

To measure this dimension, we, in conjunction with the DARCEE
developers, designed and produced late in the first year of the field test an
instrument which the teachers marked daily for each child with regard to
whether they had attempted to teach particular objectives and whether they
had been successful. This instrument was developed too late in the year
to assess the degree to which the teachers attempted to teach the specified
child objectives. Therefore, at the end of the year a questionnaire was
designed and administered to each teacher focusing on whether or not she
had attempted to teach each of the DARCEE behavioral objectives. In our
causal model we felt that the DARCEE essentials would probably be major
independent variables for attitude outcomes in the children and minor
independent variables for skill objectives, whereas the work put in on the
objectives themselves should be the major independent variables for the
skill objectives.

The Reliability of the Instrument

Three types of reliability were obtained for the observation
instrument. The first type, which we will call interscorer agreement or
interrater reliability, concerns the agreements of different raters
observing the same classroom at the same time. This reliability is
utilized to estimate the effectiveness of training raters to use the
instrument. The instrument must be reliable in the sense that each rater
will score similar events in the same way.

Table 1 presents the design utilized to obtain these
reliabilities. The coefficients for each of the interscorer agreements in the
design are presented in Table 2 as the proportion of interscorer item-by-item
agreements out of total possible agreements. These coefficients range
from 68.4 to 97.8 per cent agreement and average 85.05 per cent
agreement.

The second type of reliability is also represented in Table 1.
Denoted simply as the reliability coefficient, this reliability measure refers
to the consistency of the classroom over a short period of time. Two
raters rated the same classrooms but on different days in close proximity.
Thus, this coefficient assesses not only interrater reliability, but also
the approximate representativeness of a given classroom day with any other
day within the same time frame, such as a week.

The results of this reliability are presented in Table 3. Most
of the subscores do not appear to be subject to daily classroom variation,
with the exception of the Organization and Use of Time subscore and the
Teacher Preparation-Materials Usage subscore. Since in our evaluation






Table 1

DESIGN TO DETERMINE RELIABILITIES OF THE DARCEE CLASSROOM
ASSESSMENT SCALE IN THE FALL OF 1972

                            Raters
Classroom(a)    1      2      3      4      5     Kind of Reliability
A             11/21  11/21  11/21  11/21          Interscorer Agreement
B             11/27  11/27  11/27                 Interscorer Agreement
C             12/5   12/5   12/5                  Interscorer Agreement
D             12/4   12/6                         Reliability Coefficient
E             12/6   12/4                         Reliability Coefficient
F                                  11/13  11/13   Interscorer Agreement
G                                  11/14  11/14   Interscorer Agreement
H                                  11/13  11/13   Interscorer Agreement
I                                  11/14  11/14   Interscorer Agreement

(a) Classrooms A-E are located in Louisville, Kentucky. Raters 1-3
are local residents trained by Rater 4, a CEMREL employee. Classrooms F-I
are located in Macon, Georgia. Raters 4 and 5 are CEMREL employees who
reside in St. Louis.

Table 2

PROPORTION OF INTERSCORER ITEM AGREEMENTS USING THE DARCEE
CLASSROOM ASSESSMENT SCALE DURING THE FALL OF 1972(a)

                               Raters
Classroom    1-2    1-3    1-4    2-3    2-4    3-4    4-5
A            70.5   68.4   69.5   77.9   82.1   84.2
B            89.5   92.6          92.6
C(b)         96.8   96.8          97.8
F                                                      87.4
G                                                      93.7
H                                                      78.9
I                                                      82.1

(a) Scores represent proportion of items on which there was agreement
to total possible agreements. Disagreements on a three-point scale could be
one- or two-point disagreements. In no case were there more than [illegible]
two-point disagreements out of the 95 possible chances.

(b) Classroom C is a non-DARCEE comparison classroom.

Table 3

RELIABILITY COEFFICIENTS FOR INSTANCES IN WHICH TWO RATERS
EACH SCORED THE SAME CLASSROOM ON DIFFERENT
DAYS IN THE SAME WEEK

Subscore                                  Classroom 1   Classroom 2
 1. Physical Setting                          a             a
 2. Organization and Use of Time            .612          .802
 3. Grouping                                .988          .716
 4. Roles of Teachers in Their Teams        .892          .870
 5. Teacher Preparation-Materials Use       .364          .716
 6. Attitude Development                    .629          .780
 7. Reinforcement and Behavior Mgt.         .693          [illegible]
 8. Skill Development                       .369          .286
 9. Unit Use                                  a             a
10. Parent Involvement                        a           .000
11. Student Involvement                     .895          .847

(a) No correlations could be calculated when all items of this subscore
had identical rating and, hence, no variance.

Table 4

AVERAGE PRODUCT-MOMENT COEFFICIENTS OF INTERSCORER
AGREEMENTS BY SUBSCORES(a)

                                             Mean          Range of       S.D. of
Subscore                                     Correlation(b) Correlations  Correlations   N
 1. Physical Setting                         .88           .58-1.0        .18
 2. Organization and Use of Time             [illegible]   .76-1.0        .10
 3. Grouping                                 .92           .61-1.0        .11
 4. Roles of the Teachers in Their Teams     [illegible]   .91-1.0        .03
 5. Teacher Preparation-Materials Use        .97           .88-1.0        .04
 6. Attitude Development                     .86           .59-.99        .14
 7. Reinforcement and Behavior Mgt.          .65           .00-1.0        .36
 8. Skill Development                        .97           .87-1.0        .05
 9. Unit Use                                 1.00          1.0-1.0        .00
10. Parent Involvement                       .84           .00-1.0        .30
11. Student Involvement and Attentiveness    .88           .39-1.0        .19

(a) Coefficients of interscorer agreements on subscores are defined as
the correlation coefficients between the items which constitute each subscore.

(b) The number of correlations averaged to form the mean correlation
[remainder illegible]

strategy we plan to use the assessment scale at the beginning, middle, and
end of the school year to determine the extent of implementation variation
over the year, it is important to estimate which subscores are less
susceptible to actual teaching content and other daily variations. Additional
estimates of this variation will be obtained during the school year.

In computing the third type of reliability we looked at the
reliability of items within each subscore. Periodically, CEMREL sends score
sheets to the DARCEE training staff so that trainers might determine
non-correspondence of individual classrooms to DARCEE principles and practices.
In order to determine the internal consistency of subscores, item-by-item
correlations were computed for each possible pair of observers who rated
the same classroom at the same time. The results of this analysis are
presented in Table 4.
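
An item-by-item correlation of this kind can be sketched as a product-moment coefficient computed over the items of one subscore for a pair of raters. The function name and the sample scores below are hypothetical; returning None when either rater's scores have no variance mirrors the footnote to Table 3, which notes that no correlation could be calculated in that case.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two raters' item scores.

    Returns None when either set of scores has no variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists of at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)   # sum of squares, rater 1
    ss_y = sum((b - mean_y) ** 2 for b in y)   # sum of squares, rater 2
    if ss_x == 0 or ss_y == 0:
        return None                            # identical ratings: no variance
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores on the five items of one subscore.
rater_1 = [2, 1, 0, 2, 1]
rater_2 = [2, 1, 1, 2, 0]
print(pearson_r(rater_1, rater_2))
print(pearson_r([2, 2, 2], [2, 2, 2]))  # None: every item rated the same
```

Averaging such coefficients over all rater pairs for a subscore would give a mean correlation of the kind reported in Table 4, under the assumptions stated above.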

The only mean correlation below a readily acceptable level of
.80 was for the subscore on Reinforcement and Behavior Management. This
subscore is the most difficult to rate, especially on the items dealing
with setting standards and with reinforcement tallies. Apparently, raters
see and rate approximately the same thing for each of the other subscores.

Results 

Analysis of 1971-72 DARCEE Classrooms 
Using the Assessment Scale 

In spring of 1971-72 each DARCEE classroom and four non-DARCEE
classrooms were observed using the assessment scale. Since each class
was observed only one time and since the assessment scale was still being
revised, the results in Table 5 should be regarded tentatively. They are

Table 5

COMPARISON OF SUBSCORES FOR DARCEE AND NON-DARCEE
CLASSROOMS OBSERVED DURING THE SPRING OF 1972

                                          DARCEE Classrooms      NON-DARCEE Classrooms
                                               (N=15)                   (N=4)
                                          Mean       Standard     Mean       Standard
Subscore                                  Subscore   Deviation    Subscore   Deviation
1. Physical Setting                       .80(a)     .16          .43        .08
2. Grouping                               .84(a)     .12          .63        .14
3. Planning and Evaluation                .46(a)     .16          .13        .18
4. Teacher Roles and Responsibilities     .66(a)     .10          .16        .07
5. Organization and Use of Time           .85        .23          .69        .23
6. Unit Approach                          .55        .28          .58        .41
7. Teaching Techniques                    .71(a)     .13          .54        .16
8. Parent Involvement                     .29        .28          .02        .07
9. Student Participation                  .86        .12          .84        .21
Mean of Subscores                         .67(a)                  .44

NOTE: Subscores are reported here as proportions of agreement scores with
0 representing non-correspondence with DARCEE and 1 representing
correspondence. These scores were obtained simply by dividing the
actual subscores by 2.

(a) On a one-tailed t test for groups with independent means DARCEE
classes scored significantly higher than non-DARCEE classes on these scores
(p < .05).

shown mainly to indicate how the instruments could be used to analyze the
degree of implementation.

As Table 5 indicates, the DARCEE classes did show greater
correspondence with DARCEE principles and procedures. Examination of subscores
reveals, however, that the DARCEE classes did not significantly exceed
non-DARCEE classes on all subscores. They did exceed the non-DARCEE classes
on five subscores (p < .05) (physical setting, grouping, planning and
evaluation, teacher roles and responsibilities, and teaching techniques).
On the parent involvement subscore the DARCEE classrooms were rated higher
than the non-DARCEE classes, but this difference was not significant. On the
three other subscores (organization and use of time, unit approach, and
student participation) there appeared to be only small differences between
the DARCEE and non-DARCEE classes. As Table 5 shows, four subscores were
implemented at a level of .80 or higher, whereas only two subscores were
implemented at a level lower than .50. Examination of the particular
subscores involved reveals that the DARCEE teachers had greatest success in
obtaining student participation, setting up daily schedules, organizing
their classroom space, and grouping their children in a manner consistent
with DARCEE's prescriptions; they had moderate success in assuming
appropriate DARCEE roles and responsibilities and in using DARCEE teaching
techniques and the unit approach; and they had least success in planning
and evaluation and parent involvement.
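
A t statistic for two groups with independent means can be computed from the kind of summary statistics reported in Table 5 (mean, standard deviation, and number of classrooms). The sketch below is illustrative only. It assumes the pooled-variance form of the test and assumes that the reported standard deviations are sample standard deviations; the paper states neither, and the function name is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, degrees of freedom) for two independent groups.

    Assumes equal population variances (pooled-variance t test) and
    sample standard deviations computed with n - 1 in the denominator.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Physical Setting row of Table 5: DARCEE (N=15) versus non-DARCEE (N=4).
t, df = pooled_t(0.80, 0.16, 15, 0.43, 0.08, 4)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The resulting t would then be compared with the one-tailed critical value for the given degrees of freedom at the .05 level.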

Preliminary Analysis of Training 

During the current school year (1972-73) CEMREL is investigating
the consequences of different levels of training on implementation of the
DARCEE program (Johnson, 1973). Part of this investigation involves three
separate ratings of the pilot test classrooms with the assessment scale.
These ratings are given at the beginning, middle, and end of the school
year. Figure 5 shows the beginning of year average ratings for four classes
with maximum DARCEE training (preservice, inservice, home visitor, and
training materials), for five classes with minimum training (materials only),
and for eight comparison classes with no DARCEE training. The data in
Figure 5 were collected during the first administration of the assessment
scale.

As Figure 5 shows, classrooms with maximum training scored on
the average approximately 10 per cent higher on each of the essentials than
did the classes with materials only. With the comparison classes, however,
we do not find the same consistency. On the essentials of physical setting,
unit use, and parent involvement the comparison classrooms actually scored
higher than the DARCEE group with the maximum training; on two other
essentials (reinforcement and behavior management and attitude development)
and on student involvement they scored higher than the DARCEE classrooms
with the minimum training; whereas on the essentials of skill development,
organization and use of time, grouping, teacher roles and responsibilities,
and teacher preparation these classes scored lower than both DARCEE
classroom treatments.

Implications

The kind of instrument described in this paper certainly has some
limitations. An observer who is busy scoring items on "The DARCEE Classroom
Assessment Scale" is less likely to discover subtle differences that exist
from teacher to teacher in how they structure activities, ask questions,
and react to student behaviors than would be a non-participant observer
who spends much time in classrooms focusing on such phenomena. Moreover,
some of the items were less reliable than we had hoped.

Despite such problems, instruments such as this one may be used
by people with a relatively short period of training to answer a variety
of questions, such as:

1. Was the program being evaluated actually used?

2. Which of the program's components have shown themselves to
be most difficult or easy to implement?

3. The answer to Question 2 may be used to evaluate the success
of prior training efforts and modify future training plans.

4. Analysis of subscores in relation to child outcomes could
help test the developer's hypothesized relationships between
program elements and program outcomes.

A final and by no means minor value of developing such instruments
is that developers and evaluators in the process will portray programs in
concrete terms. If program portrayal is a major function of evaluation,
as Stake (1972) has suggested, certainly an effort like this one to specify
items that are designed to determine the degree of implementation is a
constructive move in that direction.



Of course, there is a danger that the effort to develop such an
instrument could impose a rigidity in the thinking of program developers
and trainers that could have undesirable consequences if the development of
the instrument occurs before the developers have decided what alternative
teaching behaviors they would regard as acceptable to their program.